(57) Abstract

A speech recognition system combines content analysis of a signal with directional analysis of the signal to calculate probable contents 
of the signal and the probability that the contents are as calculated. Content analysis can be combined with independent source direction 
analysis, consisting of the probable direction of the source and the probability that the source is as determined, to refine the content analysis. 
The direction analysis can affect either the calculation of the probable contents of the signal or the calculation of the probability that the 
contents are as calculated. The improved information obtained by combining the content analysis with directional analysis can be used to 
improve subsequent content analyses or directional analyses. 
APPARATUS AND METHOD FOR SPEECH
RECOGNITION USING SPATIAL INFORMATION 

Field of the Invention

This invention relates to the recognition of intelligent information in sig- 
nals. More particularly, it relates to speech recognition in environments that con- 
tain interference. 
Background of the Invention 

Automatic speech recognition systems typically perform poorly when 
operating in noisy environments, where they must distinguish between speech
("target") signals and noise (interference). If the acceptance criteria are adjusted
so that a high proportion of target signals are accepted, then many of the interfer- 
ing noise signals are also accepted. Consequently, the false-alarm rate is high. On 
the other hand, if the acceptance criteria are adjusted to achieve a low false-alarm
rate, then many target signals are rejected. Consequently, the target acceptance 
rate is low. Thus, no matter how the acceptance criteria are selected, performance 
is degraded by the presence of the interference. 

Prior art attempts to solve this problem have been based on increasing the 
ratio of target signal power to interference signal power. This is accomplished typ- 
ically by use of one or more of the following three methods. First, input signals 
can be filtered to attenuate frequencies not contained in the spectra of the target 
signals. Although generally helpful, this method is unsatisfactory in noisy envi- 
ronments. 

Second, the microphones can be located close to the source of the target 
signals to increase the relative strength of these signals. This method is useful only 
under limited conditions. It is not effective if the target signal moves or if there are 
a number of target sources at different locations. Even for a single, fixed source, 
this method may be unacceptable. For example, the target source may be a person 
who does not wish to wear a microphone headset. Or, it may be desired that the 
target not be aware of the signal acquisition system. 

A third method is to employ a microphone system that is highly directional,
oriented so that it is most sensitive in the direction of the target signals. However,
this method requires multiple microphones placed at different locations, which
requires a substantial amount of space, particularly if the system is to be
highly directional at low frequencies. Also, movement of the target source in such
a system will degrade performance of the speech recognition system, because
the directional pattern of the microphones causes the received signal to vary with
the position of the target. As a result, movement of the target source
causes changes in the received signal that cannot be distinguished from changes in
the target signal. This distortion of the received signal occurs even without any
noise.

It is therefore an object of the present invention to provide an improved
speech recognition system.

It is another object of the present invention to provide a speech recognition 
system with improved performance in environments that contain interference. 

It is a further object of the present invention to provide a speech
recognition system with improved performance in environments where the target
source moves.

Summary of the Invention

According to the present invention, these and other objects and advantages 
are achieved by employing a diversified microphone system to obtain information
on the direction of the target sound source (a "directional signature"). This direc- 
tional information is combined with the content information on which automatic 
speech recognition is customarily based. 

The system employs two subsystems. The first includes a microphone system
and associated signal processing to determine the direction of the sound source
in terms of the angular coordinates $\theta$ and $\phi$. The second combines this
directional information with the content information to determine whether the
received signal should be accepted as a target signal or rejected as interference.
The directional information provided by the first subsystem can be used to exclude
interfering signals arising from non-target directions with a low probability that
desired target signals will be excluded. The directional information also permits
signals whose content is less certain to be accepted as long as they are very
likely to come from a target source.

Brief Description of the Drawings

Figure 1 is a block diagram of a first embodiment of the speech recognition 
system of the present invention. 

Figure 2 is a block diagram of a second embodiment of the speech recogni- 
tion system of the present invention. 

Detailed Description of the Invention

The present invention is for an apparatus and method for speech recogni- 
tion using spatial information to reduce the effects of interference. 

Referring to Figure 1, time-varying signals $S_1(t)$ through $S_n(t)$ are input to
microphones 20a-20n. Preferably, microphones 20a-20n are omnidirectional
microphones or have a wide acceptance angle. These signals are then transmitted to
content analyzer 24 and source direction analyzer 28.

Content analyzer 24 is a conventional speech recognition subsystem, which
can include dedicated hardware or a specially programmed computer, as is well
known in the art. For example, the "Dragon Dictate" system by Dragon Systems
of Newton, Massachusetts is suitable for the systems of the present invention.
Content analyzer 24 provides an output M(t) on line 30. M(t) is the estimated
most probable message, along with a measure of the associated probability or
confidence rating. M(t) may represent the utterances judged to be the most likely to
have been transmitted, out of the set of utterances in the system's vocabulary. In a
preferred embodiment, M(t) at each time interval consists of a series of utterances
and probabilities $[(M_1, m_1), (M_2, m_2), \ldots, (M_k, m_k)]$ in order of decreasing
probability or confidence, where the $M_i$ are the utterances and the $m_i$ are the
corresponding probabilities, with $m_1 > m_2 > \cdots > m_k$ and $0 < m_i < 1$ for all $i$.

Source direction analyzer 28 provides an output D(t) on line 32. Where
there is assumed to be only one target source, D(t) is the estimated most
probable source direction $(\theta, \phi)$, along with a measure of the associated
probability or confidence. In a preferred embodiment, D(t) at each time interval
consists of a series of source directions and probabilities
$[(D_1, d_1), (D_2, d_2), \ldots, (D_j, d_j)]$ in order of decreasing probability or
confidence, where the $D_i$ are the source directions and the $d_i$ are the
corresponding probabilities, with $d_1 > d_2 > \cdots > d_j$ and $0 < d_i < 1$ for
all $i$. Where there may be multiple target sources, each item in the series
could include multiple source locations.
The probability or confidence level associated with the accuracy of the
identification of an utterance (the M(t) function) is normally derived from an
evaluation of the dissimilarity of two feature vectors. This technique is well known in
the art, and its application is described in the book Fundamentals of Speech
Recognition, by L. Rabiner and B.H. Juang (Prentice-Hall, 1993).
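As a rough illustration of how a ranked list $[(M_1, m_1), \ldots, (M_k, m_k)]$ might be produced from feature-vector dissimilarities, the Python sketch below scores each vocabulary template by its Euclidean distance to the observed feature vector and converts the distances into probabilities with an exponential weighting. The distance measure, the `temperature` parameter, and the hypothetical `reject_distance` (an implicit "no utterance" alternative that leaves some probability for signals matching no vocabulary entry) are assumptions for illustration only, not the scoring used by any particular recognizer.

```python
import math

def rank_utterances(features, templates, temperature=1.0, reject_distance=2.0):
    """Return [(M_1, m_1), ..., (M_k, m_k)] sorted by decreasing probability.

    features: observed feature vector (sequence of floats).
    templates: dict mapping each vocabulary utterance to a reference vector.
    reject_distance: hypothetical distance of an implicit "no utterance"
        alternative, so the m_i sum to less than 1.
    """
    # Dissimilarity of the observed vector to each template (Euclidean).
    dists = {word: math.dist(features, ref) for word, ref in templates.items()}
    # Smaller dissimilarity gives a larger weight.
    weights = {word: math.exp(-d / temperature) for word, d in dists.items()}
    total = sum(weights.values()) + math.exp(-reject_distance / temperature)
    ranked = [(word, w / total) for word, w in weights.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

For example, `rank_utterances((0.0, 0.0), {"yes": (0.0, 0.0), "no": (3.0, 4.0)})` ranks "yes" first, and the two probabilities sum to less than 1.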

The probability measure associated with the source direction (the D(t)
function) can be derived from a histogram record of directions associated with the
source for previous utterances recognized by the speech recognition system. For
example, the azimuth $\theta$ associated with each arriving utterance is determined,
and a record is maintained of the azimuth associated with each of the last 40
utterances meeting minimum probabilities. The distribution of the azimuths forms a
probability density function. Then, for example, if the measured direction of the
source for the most recent utterance has been associated with 25% of the 40
utterances for which records are maintained (that is, 10 of the 40), a probability or
confidence level of 0.25 would be assigned to that azimuth value.

The histogram (that is, the record of measured directions and counts) is
continually updated as new utterances are accepted by the system, and the data for
the oldest utterances are dropped. Consequently, the histogram changes in a manner
that adapts to changes in the position of the source. For example, if a number of
recent utterances arise from azimuth angles to the right of the centroid of the
probability density function, which would correspond to movement of the source to the
right, the adaptive process causes the centroid to shift to the right.
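The sliding-window histogram described in the two preceding paragraphs can be sketched in Python as follows. This is a minimal sketch, assuming azimuths in degrees quantized into bins of a hypothetical `bin_width`, and assuming the caller records only utterances that met the minimum probabilities; the class and method names are illustrative.

```python
from collections import Counter, deque

class AzimuthHistogram:
    """Sliding-window record of azimuths (degrees) of recently accepted utterances."""

    def __init__(self, window=40, bin_width=1.0):
        self.bin_width = bin_width
        # deque(maxlen=...) drops the oldest record automatically when full.
        self.records = deque(maxlen=window)

    def _bin(self, azimuth):
        return round(azimuth / self.bin_width)

    def record(self, azimuth):
        self.records.append(self._bin(azimuth))

    def probability(self, azimuth):
        """Fraction of recorded utterances whose azimuth falls in the same bin."""
        if not self.records:
            return 0.0
        return Counter(self.records)[self._bin(azimuth)] / len(self.records)

    def centroid(self):
        """Mean recorded azimuth in degrees (None if empty); moves with the source."""
        if not self.records:
            return None
        return sum(self.records) * self.bin_width / len(self.records)
```

Recording 10 utterances at 20 degrees and 30 at -10 degrees gives `probability(20.0) == 0.25`, matching the worked example; recording 40 further utterances at 30 degrees flushes the window and moves the centroid to 30 degrees.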

The probability determination can be modified by adding a weighting
function based on the age of each of the records (e.g., giving more weight to the most
recent records) or on some other factor appropriate to known characteristics of the
source. Preferably, the weighting function is derived from prior knowledge of the
probable locations and movements of the source. This prior knowledge can be
based, for example, on physical constraints such as the location of the speaker's
chair, or on information obtained during the phonetic training period normally
required by a speech recognition system so that it can take into account the
individual characteristics of the speaker.
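One possible age-based weighting is sketched below in Python. The text does not specify the weighting function; the exponential decay used here (a record that is `age` utterances old receives weight `decay ** age`) and the parameter names are assumptions for illustration.

```python
def weighted_probability(azimuths, query, decay=0.9, bin_width=1.0):
    """Age-weighted fraction of records falling in the same bin as `query`.

    azimuths: recorded azimuths in degrees, ordered oldest to newest.
    decay: hypothetical per-record weight factor, 0 < decay <= 1.
    """
    if not azimuths:
        return 0.0
    target_bin = round(query / bin_width)
    newest = len(azimuths) - 1
    hit = total = 0.0
    for index, azimuth in enumerate(azimuths):
        weight = decay ** (newest - index)  # the most recent record has weight 1
        total += weight
        if round(azimuth / bin_width) == target_bin:
            hit += weight
    return hit / total
```

With `decay=1.0` this reduces to the unweighted fraction; with `decay=0.5`, a single recent record at 10 degrees following three older records at 0 degrees scores about 0.53 rather than 0.25.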

Where only one source location is active at any one time, the source
direction can be determined for one angular dimension $\theta$ by using two
microphones, which have inputs $S_1$ and $S_2$. In a preferred embodiment, $S_1(t)$ will be
found to be attenuated and delayed with respect to $S_2(t)$, or vice versa. Values of
$\alpha$ and $\tau$ are determined to minimize the difference signal
$Z(t) = S_2(t) - \alpha S_1(t - \tau)$. In the absence of noise, values for $\alpha$ and
$\tau$ can be selected for which $Z(t)$ will be zero for all times $t$.

This can be accomplished, for example, with an array of delay lines, where
each delay line applies a different delay $\tau$ to the signal $S_1(t)$, and an array of
attenuation lines, where each attenuation line applies a different attenuation $\alpha$
to the same signal $S_1(t)$. A matrix is formed, consisting of each combination of a
delay and an attenuation of the signal $S_1(t)$ (i.e., $\alpha S_1(t - \tau)$ for different values of
$\alpha$ and $\tau$). Each combination is compared to the other signal, $S_2(t)$. The comparison
that yields the smallest difference between the signals (i.e., the minimum magnitude of
$Z(t)$) determines the appropriate values for $\alpha$ and $\tau$. The conditions under which
the system will be used determine the number of delay and attenuation
lines needed to obtain sufficient accuracy in the determination of $\alpha$ and $\tau$.
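The delay-line and attenuation-line matrix amounts to a grid search over $(\alpha, \tau)$. The Python sketch below is a software analogue under stated assumptions: delays are whole samples, the "smallest difference" is measured as the energy $\sum_t Z(t)^2$ over the segment, and samples outside the recorded segment are treated as zero.

```python
def estimate_alpha_tau(s1, s2, alphas, taus):
    """Grid search for the attenuation alpha and delay tau minimizing Z(t).

    alphas: candidate attenuations (one per "attenuation line").
    taus: candidate integer delays in samples (one per "delay line"); a positive
        tau models s2 arriving later than s1, a negative tau models s2 arriving earlier.
    Returns the (alpha, tau) with the smallest energy sum of Z(t)**2,
    where Z(t) = s2[t] - alpha * s1[t - tau].
    """
    best = None
    for tau in taus:
        for alpha in alphas:
            energy = 0.0
            for t in range(len(s2)):
                k = t - tau
                delayed = s1[k] if 0 <= k < len(s1) else 0.0  # zero outside the record
                z = s2[t] - alpha * delayed
                energy += z * z
            if best is None or energy < best[0]:
                best = (energy, alpha, tau)
    return best[1], best[2]
```

If `s2` is `s1` halved and delayed by three samples, the search over `alphas=[0.25, 0.5, 0.75, 1.0]` and `taus=range(-4, 8)` returns `(0.5, 3)`. The accuracy is limited by the grid spacing, which corresponds to the choice of how many delay and attenuation lines to provide.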

The direction of the source is then determined by converting $(\alpha, \tau)$ to $\theta$, as
is well known in the art. This method can be applied where $n$ source locations may
be active at a given time by employing $n+1$ microphones and determining values
for $\alpha_1, \ldots, \alpha_n$ and $\tau_1, \ldots, \tau_n$ to minimize $Z(t)$ for all times $t$.
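One common textbook conversion, assuming a distant (far-field) source, two microphones separated by a distance $d$, and a speed of sound $c$, is $\theta = \arcsin(c\tau / d)$, with $\theta$ measured from the direction perpendicular to the line joining the microphones. Under this model $\alpha$ is not needed; the patent does not state which conversion it intends, so the Python sketch below is only one possibility.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C

def delay_to_azimuth(tau_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field azimuth (radians from broadside) from an inter-microphone delay in seconds."""
    x = c * tau_s / mic_spacing_m
    x = max(-1.0, min(1.0, x))  # clamp: measurement noise can push |x| slightly above 1
    return math.asin(x)
```

For microphones 0.2 m apart, a delay of 0.1/343 s (about 0.29 ms) corresponds to an azimuth of 30 degrees.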
Alternatively, the source direction can be determined without the use of
delay lines: by measuring the delay $\tau$ between signal onsets in the two channels and
using a comparator to determine the difference in amplitude of the signal onsets;
by using microphones with different directional patterns and measuring the
amplitude difference in the two channels; by measuring the correlation coefficient
between the two channels after first whitening the spectra of the signals; or by
other methods known in the art. These methods also can be applied where more
than one source location may be active at a given time.
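The whitened-correlation option can be illustrated with the generalized cross-correlation with phase transform (GCC-PHAT), a standard technique that divides the cross-spectrum by its magnitude before transforming back; it is named here as one way to realize the method, not as the patent's own. To stay dependency-free the Python sketch uses a naive $O(N^2)$ DFT and treats the signals as circular, so real recordings would need zero-padding to at least `len(s1) + max_lag`.

```python
import cmath

def _dft(x, sign):
    """Naive discrete Fourier transform: sign=-1 forward, sign=+1 inverse (unscaled)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def whitened_delay(s1, s2, max_lag):
    """Estimate how many samples s2 lags s1 from the phase-only cross-spectrum (GCC-PHAT)."""
    n = len(s1)
    X1, X2 = _dft(s1, -1), _dft(s2, -1)
    cross = []
    for a, b in zip(X1, X2):
        c = b * a.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)  # whitening: keep phase only
    r = _dft(cross, +1)  # inverse transform; the missing 1/n scale does not move the peak
    # In a circular correlation, lag L appears at index L mod n.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n].real)
```

Because the magnitude is discarded, the peak is sharp regardless of the spectral shape of the speech, which is the purpose of whitening before correlating. The returned lag can then be converted to an angle in the same way as $\tau$ above.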
The outputs M(t) on line 30 and D(t) on line 32 are input to and processed
in processor 36. The signals M(t) consist of pairs $[(M_1, m_1), (M_2, m_2), \ldots, (M_k, m_k)]$
for each time segment. The probability $m$ that the received signal arises from one
of the utterances $M_i$ can be represented by:

$$m = \sum_{i=1}^{k} m_i$$

and the probability that the received signal does not arise from one of the
utterances $M_i$ is given by $m^* = 1 - m$.

Where there is a single signal source, the directional information D(t) can
be represented by the total probability $d$ that the source is a target source. That is,
having determined the direction, the source direction analyzer 28 provides a
probability $d$ that the source is a target. The probability $d$ is independent of the content
information M. In this situation, the output Q(t) of processor 36, on line 38, is the
set of pairs $[(M_1, q_1), (M_2, q_2), \ldots, (M_k, q_k)]$ for each time segment, where
$q_i = m_i d$. Thus, by combining the directional information D(t) with the content
information M(t), the probability of the signal resulting from target message $M_i$ is
$m_i d$ instead of $m_i$. The probability of the signal resulting from some target message is:

$$m' = \sum_{i=1}^{k} q_i = d\,m = d(1 - m^*)$$

and the probability of the signal not resulting from some target message is given by
$m'^* = 1 - d\,m$.
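The combination performed by processor 36 in the single-source case can be written directly from these definitions. The Python sketch below, with a hypothetical function name, returns the pairs $(M_i, q_i)$ together with $m$ and $m'$.

```python
def combine_content_and_direction(content, d):
    """Weight each content hypothesis (M_i, m_i) by the direction probability d.

    content: list of (utterance, m_i) pairs from the content analyzer.
    d: probability that the source is a target source, independent of content.
    Returns (Q, m, m_prime) with Q = [(M_i, q_i)], q_i = m_i * d,
    m = sum of m_i, and m_prime = sum of q_i = d * m.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    q = [(utterance, m_i * d) for utterance, m_i in content]
    m = sum(m_i for _, m_i in content)
    m_prime = sum(q_i for _, q_i in q)
    return q, m, m_prime
```

With content `[("yes", 0.6), ("no", 0.2)]` and `d = 0.5`, the result is $q_1 = 0.3$, $q_2 = 0.1$, $m = 0.8$, $m' = 0.4$, and $m'^* = 0.6$. A system would then accept the top hypothesis if $q_1$ exceeds its acceptance threshold.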

As shown, the addition of the directional information D(t) decreases the
probability that any signal will be regarded as a target message, because
probabilities based on both content and direction are considered: since $0 \le d \le 1$,
each $q_i = m_i d$ is no larger than $m_i$. However, this effect can
be offset by lowering the numerical threshold of acceptance of an utterance, a
quantity that is, or can readily be made, accessible to the user in the software of
speech recognition systems.

Numerous methods may be employed to determine the probability that the
source is a target source from the directional information D(t). In a first method,
a priori information on the position of the target source is utilized. In one
embodiment, the system would assume that the mean target position is straight ahead
($\theta = 0$), so that $d = 1$ for $\theta = 0$, and that the probability of the target being at any
other angle decreases exponentially with angle (that is, $d = e^{-\lambda|\theta|}$, where
$\lambda$ is a constant).

In a second method, $d(\theta)$ is constructed during training of the recognition
system. For example, where the system is trained in a quiet environment, a
histogram of target values of $\theta$ is accumulated and used to determine the function
$d(\theta)$. In a preferred embodiment using this method, the distribution of $\theta$ for
interference also is estimated during training and included in the function $d(\theta)$.
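The first two methods for forming $d(\theta)$ can be sketched in Python as below. The value of $\lambda$ (`lam`, per degree), the bin width, the normalization that gives the most frequent training angle $d = 1$, and the way interference is "included" (the share of histogram evidence in a bin that is target, assuming equal prior odds of target and interference) are all assumptions; the text states only that the interference distribution is included.

```python
import math
from collections import Counter

def d_prior(theta_deg, lam=0.05):
    """First method: d = exp(-lam * |theta|), equal to 1 straight ahead (theta = 0)."""
    return math.exp(-lam * abs(theta_deg))

def d_from_training(target_angles, interference_angles=(), bin_width=5.0):
    """Second method: build d(theta) from angle histograms collected during training."""
    def to_bin(angle):
        return round(angle / bin_width)

    target = Counter(to_bin(a) for a in target_angles)
    interference = Counter(to_bin(a) for a in interference_angles)
    n_target = max(len(target_angles), 1)
    n_interf = max(len(interference_angles), 1)
    peak = max(target.values(), default=0) / n_target

    def d(theta_deg):
        b = to_bin(theta_deg)
        p_target = target[b] / n_target
        if not interference:
            # Target-only training: scale so the most frequent angle gets d = 1.
            return p_target / peak if peak > 0 else 0.0
        # With interference data: share of this bin's evidence that is target
        # (assumes equal prior odds of target and interference).
        p_interf = interference[b] / n_interf
        total = p_target + p_interf
        return p_target / total if total > 0 else 0.0

    return d
```

Training on target angles `[0, 0, 0, 10]` alone gives $d(0) = 1$ and $d(10) = 1/3$; adding interference observed at 10 and 20 degrees drives $d(20)$ to 0.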
In a third method, $d(\theta)$ is constructed by exploiting the message content
information M(t) during real-time use of the system. For example, for each
received signal, the source direction angle $\theta$ and the probability $m$ that the
received signal arises from one of the utterances are used to develop a histogram of
acceptable source direction angles $\theta$. The histogram is a representation of a list of
ordered pairs $(\theta_i, m_i)$, in which each pair consists of an angle and a probability,
and in which totals have been sorted by $\theta$. Since the histogram is derived from actual
data, it can be used to replace a Gaussian or other nominal statistical distribution.
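A minimal Python sketch of this run-time histogram follows, assuming angles in degrees, a hypothetical bin width, and normalization by the largest bin total so that $d(\theta)$ stays in $[0, 1]$ (the text says only that the histogram is used to form $d(\theta)$). For the feedback arrangement described next, the same `update` call could be fed the combined probabilities $q_i$ from Q(t) instead of $m$.

```python
class OnlineDirectionModel:
    """Third method: accumulate content probability m per direction bin at run time."""

    def __init__(self, bin_width=5.0):
        self.bin_width = bin_width
        self.totals = {}  # direction bin -> summed probability

    def _bin(self, theta_deg):
        return round(theta_deg / self.bin_width)

    def update(self, theta_deg, m):
        """Add one received signal's (theta, m) pair to the histogram."""
        b = self._bin(theta_deg)
        self.totals[b] = self.totals.get(b, 0.0) + m

    def d(self, theta_deg):
        """Histogram total for theta's bin, relative to the largest bin total."""
        peak = max(self.totals.values(), default=0.0)
        if peak <= 0.0:
            return 0.0
        return self.totals.get(self._bin(theta_deg), 0.0) / peak
```

Because each angle is weighted by how confidently its utterance was recognized, directions that produce well-recognized speech gradually dominate $d(\theta)$.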

The histogram is used to form $d(\theta)$, which processor 36 in turn uses to
obtain the output Q(t). Preferably, this modified content information Q(t) is fed
back to source direction analyzer 28 to improve the directional information. For
example, the modified content information Q(t) can be used to modify the
directional probability function $d(\theta)$. This method can be used where initial values for
$d(\theta)$ are formed from a different method. The modified content information Q(t)
also can be fed back to content analyzer 24 to improve the initial message content
information M(t). For example, the modified content information Q(t) can be used
to modify the utterance probability function.

In addition to improving the message content information M(t), the
directional information can be used to permit the voice recognition system to
distinguish between sources as long as the sources generally do not overlap. For
example, where there are two sources that generally do not overlap, there would be
two direction functions $d_1(\theta)$ and $d_2(\theta)$, for which there generally are no angles
$\theta$ at which both $d_1(\theta)$ and $d_2(\theta)$ are large. With these functions, the system could
determine which source supplied an utterance. The determination of the most
probable message could also depend on the direction.
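Attributing an utterance to one of several non-overlapping sources can be sketched in Python as picking the largest $d_j(\theta)$. The Gaussian-shaped direction functions centered at -30 and +30 degrees and the `min_d` floor below which no source is credited are hypothetical choices for illustration.

```python
import math

def attribute_source(theta_deg, direction_functions, min_d=0.5):
    """Pick the source whose direction function d_j(theta) is largest.

    direction_functions: dict mapping source name -> d_j(theta).
    min_d: hypothetical floor; if no d_j reaches it, no source is credited.
    """
    name, best = max(((n, f(theta_deg)) for n, f in direction_functions.items()),
                     key=lambda pair: pair[1])
    return name if best >= min_d else None

# Two non-overlapping sources, to the left (-30 deg) and right (+30 deg).
sources = {
    "left": lambda th: math.exp(-((th + 30.0) / 10.0) ** 2),
    "right": lambda th: math.exp(-((th - 30.0) / 10.0) ** 2),
}
```

An utterance arriving from -28 degrees is credited to "left", one from 31 degrees to "right", and one from straight ahead to neither, since both functions are small there.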

Alternatively, the directional information can be integrated with the content
information in other ways. In one embodiment, the estimated direction $\theta$ is coded
into a signal N(t), which is appended to the signals $S_i(t)$ at the input to content
analyzer 24. Preferably, directional segment N(t) is attached as soon as the directional
information is available, as an interruption or a change in the utterance waveform.
This transforms the directional information into content information, and the
signal is processed without the need for processor 36 to combine directional
information with the processed content information. That is, all of the content
processing takes place in content analyzer 24.

In one embodiment, direction signal N(t) is appended as a suffix to the
signals $S_i(t)$. Whenever the angle $\theta$ is outside a previously defined range of
acceptable angles, the value of N(t) is such that content analyzer 24 will reject the
signal as not corresponding to a recognized utterance within the system's vocabulary.
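The suffix idea can be illustrated with a toy symbolic model in Python, in which text tokens stand in for waveform segments and the direction suffix N(t) is one extra token. The token names, the acceptable range of -20 to +20 degrees, and the exact-match "content analyzer" are all hypothetical; a real system would append an actual signal segment and rely on its recognizer's matching.

```python
# Toy stand-ins: a real system appends a waveform segment N(t), not a text token.
IN_RANGE = "<dir:in-range>"
OUT_OF_RANGE = "<dir:out-of-range>"

def direction_suffix(theta_deg, lo=-20.0, hi=20.0):
    """Code the estimated direction as a suffix symbol (range limits are hypothetical)."""
    return IN_RANGE if lo <= theta_deg <= hi else OUT_OF_RANGE

def recognize(segments, vocabulary):
    """Toy content analyzer: the whole segment sequence must match a vocabulary entry."""
    return vocabulary.get(tuple(segments))

# Every vocabulary entry carries the in-range suffix, so an out-of-range
# suffix can never match and the utterance is rejected.
VOCABULARY = {
    ("open", "file", IN_RANGE): "OPEN_FILE",
    ("close", "file", IN_RANGE): "CLOSE_FILE",
}
```

Here `recognize(["open", "file", direction_suffix(5.0)], VOCABULARY)` returns `"OPEN_FILE"`, while the same words spoken from 45 degrees return `None`. No separate combining step is needed, which mirrors how this embodiment dispenses with processor 36.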

An alternative embodiment for integrating the directional information with
the content information is shown in Figure 2. Time-varying signals $S_1(t)$ through $S_n(t)$
are input to microphones 50a-50n. These signals are then transmitted to direction
analyzer 54 and mixer 56. Direction analyzer 54 provides an output on line 58, in
the manner described above, that reflects the probability that the source signals
originate with a desired target source. Line 58 provides the input to noise
generator 60, which generates an amount of noise inversely proportional to the
probability that the source signals originate with the desired target source. That is, the
noise level increases as the probability that the signal arises from the desired target
source decreases.

The noise signal is output on line 62, which connects to mixer 56. In mixer
56, the signals $S_1(t)$ through $S_n(t)$ are mixed with the noise signal from noise generator
60 and then input to content analyzer 64. Content analyzer 64 outputs message
information Q(t) on line 66. An increase in the noise on the signal input to content
analyzer 64 decreases the likelihood that the utterance will be recognized and/or
the confidence level in the accuracy of the utterance determination. Thus,
represented as noise, the direction information imposes a constraint on the
content-analyzing portion of the system.
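Noise generator 60 and mixer 56 can be approximated in software as below. This Python sketch assumes additive Gaussian noise whose standard deviation is `scale / d`, which is inversely proportional to the direction probability $d$; `scale` and the `d_floor` guard against division by zero are hypothetical parameters, and a single sample stream stands in for $S_1(t)$ through $S_n(t)$.

```python
import random

def noise_level(d, scale=0.05, d_floor=1e-3):
    """Noise standard deviation inversely proportional to d.

    d_floor avoids division by zero when the direction analyzer reports d = 0.
    """
    return scale / max(d, d_floor)

def mix_with_direction_noise(samples, d, scale=0.05, seed=None):
    """Mixer analogue: add Gaussian noise whose level grows as d falls."""
    rng = random.Random(seed)
    sigma = noise_level(d, scale)
    return [x + rng.gauss(0.0, sigma) for x in samples]
```

Halving $d$ doubles the noise standard deviation, so a signal from an unlikely direction reaches the content analyzer noticeably degraded, and its recognition confidence falls accordingly.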

In a simplified version of the embodiment shown in Figure 2, the direction
information would be used either to include or to exclude the source signals
$S_1(t)$ through $S_n(t)$. In this embodiment, the directional information is used
independently of the content information. If the probability that the signal originated
with a desired target fell below a predetermined threshold, the system would reject the
signal. If the probability exceeded the threshold, the signal would be input to the
content analyzer unchanged.

While there have been shown and described examples of the present invention,
it will be readily apparent to those skilled in the art that various changes and
modifications may be made therein without departing from the scope of the inven- 
tion as defined by the appended claims. Accordingly, the invention is limited only 
by the following claims and equivalents thereto. 

What is claimed is: 


1. A recognition system for a source signal comprising:
input means for obtaining at least one input signal from the source signal; 
a content analyzer having a content input coupled to the input means, a
content output, and a content processor for calculating and providing to the content
output a content signal representative of an initial calculated probable value of the
source signal; 

a direction analyzer having a direction input coupled to the input means, a 
direction output, and a direction processor for calculating and providing to the
direction output a direction signal representative of a calculated source direction of the
source signal; and 

a message processor having a first input coupled to the content output, a
second input coupled to the direction output, a message output, and a message
processor for calculating and providing to the message output a message signal
representative of a supplemental calculated probable value of the source signal.

2. The recognition system as in claim 1, wherein the content processor
further calculates and provides to the content output a content probability signal
representative of a confidence level in the initial calculated probable value.

3. The recognition system as in claim 2, wherein the direction processor 
further calculates and provides to the direction output a direction probability signal 
representative of a confidence level in the calculated source direction. 

4. The recognition system as in claim 3, wherein the message processor 
further calculates and provides to the message output a message probability signal 
representative of a confidence level in the supplemental calculated probable value of 
the source signal. 

5. The recognition system as in claim 4, wherein the input means 
includes at least one microphone. 

6. The recognition system as in claim 4, wherein the message processor 
calculates the message probability signal by multiplying a value of the direction 
probability signal by a value of the content probability signal. 

7. The recognition system as in claim 4, wherein the message signal is 
substantially the same as the content signal. 
8. The recognition system as in claim 4, wherein the direction processor
uses prior calculations of the source direction to calculate the direction probability 
signal. 

9. The recognition system as in claim 4, wherein the direction analyzer 
further includes a feedback input connected to the message output, and wherein the 
direction processor uses prior values of the message signal and the message proba- 
bility signal to calculate the direction probability signal. 

10. The recognition system as in claim 4, wherein the content analyzer 
further includes a feedback input connected to the message output, and wherein the 
content processor uses prior values of the message signal and the message probabil- 
ity signal to calculate the content probability signal. 

11. The recognition system as in claim 4, wherein the message processor
calculates the message signal from the content signal and the direction signal. 

12. The recognition system as in claim 4, wherein the message processor 
calculates the message signal from the content signal, the direction signal, and the 
direction probability signal. 

13. A recognition system for a source signal comprising: 

input means for obtaining at least one input signal for the source signal; 

a direction analyzer having a direction input coupled to the input means, a 
direction output, and a direction processor for calculating a probability that the 
source signal originates from one of at least one predetermined direction and provid- 
ing to the direction output a direction probability signal representative of the proba- 
bility; 

a noise generator having an input coupled to the direction output and a noise 
output, for providing a noise signal inversely proportional to the probability; 

a mixer having a first mixer input coupled to the input means, a second mixer 
input coupled to the noise output, and a mixer output, for combining signals on the 
first mixer input and the second mixer input; and 

a content analyzer having a content input coupled to the mixer output, a
content output, and a content processor for calculating and providing to the content
output a content signal representative of a calculated value of the source signal.

14. The recognition system as in claim 13, wherein the content processor 
further calculates and provides to the content output a content probability signal
representative of a confidence level in the calculated value of the source signal.

15. The recognition system as in claim 14, wherein the input means 
includes at least one microphone. 

16. A method for converting a source signal to a symbolic signal com- 
prising the steps of: 

obtaining at least one input signal from the source signal; 
calculating from the at least one input signal a probable source direction of 
the source signal; 

generating from the calculated probable source direction a direction signal; 
appending the direction signal to the input signal; 

accepting the appended input signal if the appended input signal corresponds 
within a predetermined degree to one of a plurality of predetermined acceptable sig- 
nals; 

rejecting the appended input signal if the appended input signal does not cor- 
respond within the predetermined degree to one of the plurality of predetermined 
acceptable signals; and 

generating from an accepted appended input signal a signal symbolic of a
probable value of the source signal. 

17. The method for converting a source signal as in claim 16, wherein the 
direction signal generating step includes generating a first direction signal if the cal- 
culated probable source direction is within a predetermined range of acceptable 
directions and generating a second direction signal if the calculated probable source 
direction is not within the predetermined range. 

18. The method for converting a source signal as in claim 17, wherein the 
rejecting step includes rejecting an appended input signal if the appended input sig- 
nal includes the second direction signal. 

19. A method for converting a source signal to a symbolic signal com- 
prising the steps of: 

obtaining at least one input signal from the source signal; 
calculating from the at least one input signal a probable source direction of 
the source signal; 
generating from the calculated probable source direction a direction signal;

accepting the input signal if the input signal corresponds within a predeter- 
mined degree to one of a plurality of predetermined acceptable signals and the direc- 
tion signal is within a predetermined range of acceptable direction signals; and 

generating from an accepted input signal a signal symbolic of a probable 
value of the source signal. 

20. A method for converting a source signal to a symbolic signal com- 
prising the steps of: 

obtaining at least one input signal from the source signal; 

generating from the at least one input signal a content signal representative of 
a calculated probable value of the source signal; 

generating from the at least one input signal a direction signal representative 
of a calculated probable direction of the source signal; and 

generating from the content signal and the direction signal a message signal 
including a message content component symbolic of a calculated probable value of 
the source signal. 

21. The method for converting a source signal as in claim 20, wherein the 
content signal generating step includes generating a first content component sym- 
bolic of a probable value of the source signal and a second content component repre- 
sentative of a confidence level in the probable value. 

22. The method for converting a source signal as in claim 21, wherein the
direction signal generating step includes generating a first direction component rep- 
resentative of the probable source direction and a second direction component repre- 
sentative of the confidence level in the probable source direction. 

23. The method for converting a source signal as in claim 22, wherein the 
message signal generating step further includes generating from the second content 
component and the second direction component a message probability component 
representative of a confidence level in the message content component.

24. The method for converting a source signal as in claim 22, wherein the 
message signal generating step includes multiplying a value of the second content 
component by a value of the second direction component. 

25. A method for converting a source signal to a symbolic signal com- 
prising the steps of:

obtaining at least one input signal from the source signal;

calculating a probability that the source signal originates from one of at least 
one predetermined direction;

generating a noise signal inversely proportional to the calculated probability; 

mixing the at least one input signal with the generated noise signal; and 

generating from the mixed signal a content symbol representative of a proba- 
ble value of the source signal. 

26. A method for converting a source signal to a symbolic signal com- 
prising the steps of: 

obtaining at least one input signal from the source signal; 

generating from the at least one input signal a direction signal representative 
of a calculated probable direction of the source signal; and 

generating from the at least one input signal and the direction signal a con- 
tent signal representative of a calculated probable value of the source signal. 